Control the number of rows returned by SelectResult #9273
93e996d
to
fde0c6a
Compare
Codecov Report
@@ Coverage Diff @@
## master #9273 +/- ##
==========================================
+ Coverage 67.22% 67.24% +0.01%
==========================================
Files 371 371
Lines 77271 77283 +12
==========================================
+ Hits 51949 51968 +19
+ Misses 20682 20677 -5
+ Partials 4640 4638 -2
util/chunk/recordbatch.go
Outdated
// requiredRows indicates how many rows is considered full for parent executor.
// Child executor can return immediately if there are such number of rows,
// instead of fulling the whole chunk.
// This is not compulsory, so the number of returned rows can be larger than it in some cases.
How about:
// requiredRows indicates how many rows are required by the parent executor.
// The child executor should stop populating rows as soon as there are at
// least requiredRows rows in the Chunk.
Comments have been updated.
util/chunk/recordbatch.go
Outdated
}

// IsFull returns if this batch can be considered full.
func (rb *RecordBatch) IsFull(maxChunkSize int) bool {
Can we remove maxChunkSize from the interface and only check whether rb.NumRows() >= rb.requiredRows? Should we set rb.requiredRows according to the max chunk size?
Only checking requiredRows is dangerous and inconvenient.
I prefer to take both requiredRows and MaxChunkSize into account in this function.
How about introducing sessionctx.Context into RecordBatch?
Then IsFull will behave like:
IsFull() bool {
    return numRows >= ctx.GetSessionVars().MaxChunkSize ||
        (requiredRows != UnspecifiedNumRows && numRows >= requiredRows)
}
Is requiredRows determined by maxChunkSize?
requiredRows should be less than or equal to maxChunkSize, but it can be Unspecified.
When it's Unspecified, it is dangerous for IsFull to always return false because it ignores maxChunkSize.
And in each place where IsFull is used, it is inconvenient to check both IsFull and maxChunkSize, like:
if !batch.IsFull() && batch.NumRows() < ctx.GetSessionVars().MaxChunkSize { ... }
I found that some executors have their own maxChunkSize instead of using the global maxChunkSize in sessionctx.Context, so the approach mentioned above is not appropriate.
I will remove maxChunkSize from IsFull and let the outside scope where IsFull is called check it.
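For readers following the thread, here is a minimal sketch of the shape this decision implies: IsFull looks only at requiredRows, and the caller combines it with its own size limit. The RecordBatch below is a simplified stand-in with a bare row counter (the real type wraps a chunk), the zero-valued UnspecifiedNumRows follows what is said later in the thread, and fill is a hypothetical caller, not code from this PR.

```go
package main

import "fmt"

// UnspecifiedNumRows means the parent did not ask for a specific row count.
// Assumption: it is the zero value of int, as stated later in this thread.
const UnspecifiedNumRows = 0

// RecordBatch is a simplified stand-in for the type under review.
type RecordBatch struct {
	numRows      int
	requiredRows int
}

// NumRows reports how many rows the batch currently holds.
func (rb *RecordBatch) NumRows() int { return rb.numRows }

// IsFull consults only requiredRows; it knows nothing about chunk size limits.
func (rb *RecordBatch) IsFull() bool {
	return rb.requiredRows != UnspecifiedNumRows && rb.numRows >= rb.requiredRows
}

// fill is a hypothetical caller. It owns maxChunkSize and checks it next to
// IsFull, so an unspecified requirement cannot make the loop run unbounded.
func fill(rb *RecordBatch, maxChunkSize, available int) int {
	for !rb.IsFull() && rb.NumRows() < maxChunkSize && available > 0 {
		rb.numRows++
		available--
	}
	return rb.NumRows()
}

func main() {
	fmt.Println(fill(&RecordBatch{requiredRows: 3}, 1024, 10)) // stops at the requirement
	fmt.Println(fill(&RecordBatch{}, 4, 10))                   // stops at the caller's limit
}
```

The second call shows the hazard raised above: with an unspecified requirement IsFull never returns true, so the caller's own maxChunkSize check is what ends the loop.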
}

// SetRequiredRows sets the number of rows the parent executor wants.
func (rb *RecordBatch) SetRequiredRows(numRows int) *RecordBatch {
If we allocate a RecordBatch with zero default values, rb := &RecordBatch{}, and never call SetRequiredRows(), 0 is not treated as unspecified.
UnspecifiedNumRows has been updated to the zero value of int.
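A small sketch of why a zero-valued sentinel addresses the concern above: a batch built with &RecordBatch{} and a batch given a non-positive count both end up "unspecified". Only the numRows <= 0 branch is taken from the diff; the rest of the SetRequiredRows body, the stripped-down struct, and the hasRequirement helper are assumptions made for illustration.

```go
package main

import "fmt"

// UnspecifiedNumRows is the zero value of int, per the reply above.
const UnspecifiedNumRows = 0

// RecordBatch is a stripped-down stand-in holding only the field discussed.
type RecordBatch struct {
	requiredRows int
}

// SetRequiredRows normalizes non-positive input to UnspecifiedNumRows and
// returns the receiver so calls can be chained. The assignment and return
// are assumed; the diff only shows the normalization branch.
func (rb *RecordBatch) SetRequiredRows(numRows int) *RecordBatch {
	if numRows <= 0 {
		numRows = UnspecifiedNumRows
	}
	rb.requiredRows = numRows
	return rb
}

// hasRequirement is a hypothetical helper used only in this sketch.
func (rb *RecordBatch) hasRequirement() bool {
	return rb.requiredRows != UnspecifiedNumRows
}

func main() {
	fresh := &RecordBatch{} // SetRequiredRows never called
	fmt.Println(fresh.hasRequirement())
	fmt.Println(fresh.SetRequiredRows(-5).hasRequirement())
	fmt.Println(fresh.SetRequiredRows(16).hasRequirement())
}
```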
chk.Reset()
maxChunkSize := r.ctx.GetSessionVars().MaxChunkSize
for chk.NumRows() < maxChunkSize {
return r.NextBatch(ctx, chunk.NewRecordBatch(chk))
how about setting the required rows of that newly created batch to max chunk size?
util/chunk/recordbatch.go
Outdated
}

// IsFull returns if this batch can be considered full.
func (rb *RecordBatch) IsFull(maxChunkSize int) bool {
What's the relationship between the param maxChunkSize and the member variable requiredRows?
maxChunkSize has been removed from this function now.
// SetRequiredRows sets the number of rows the parent executor wants.
func (rb *RecordBatch) SetRequiredRows(numRows int) *RecordBatch {
	if numRows <= 0 {
		numRows = UnspecifiedNumRows
If numRows <= 0, should we set requiredRows to ctx.MaxChunkSize instead?
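One way to picture this alternative is a helper that resolves the requirement against a limit supplied by the caller. The sketch below is not from the PR: effectiveRequiredRows is a hypothetical name, and it takes maxChunkSize as a parameter rather than reading it from a context because, as noted earlier in the thread, some executors carry their own maxChunkSize. Capping larger values reflects the earlier statement that requiredRows should not exceed maxChunkSize.

```go
package main

import "fmt"

// effectiveRequiredRows is a hypothetical helper, not part of the PR.
// It falls back to maxChunkSize when numRows is non-positive and caps
// numRows at maxChunkSize otherwise.
func effectiveRequiredRows(numRows, maxChunkSize int) int {
	if numRows <= 0 || numRows > maxChunkSize {
		return maxChunkSize
	}
	return numRows
}

func main() {
	fmt.Println(effectiveRequiredRows(0, 1024))    // no requirement given
	fmt.Println(effectiveRequiredRows(32, 1024))   // requirement within the limit
	fmt.Println(effectiveRequiredRows(4096, 1024)) // requirement above the limit
}
```

With this shape there is no separate "unspecified" state to test for, at the cost of every call site having to know which maxChunkSize applies.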
037fa57
to
973b698
Compare
What problem does this PR solve?
Control the number of rows returned by SelectResult. This PR is a subtask of #9166.
What is changed and how it works?
Add requiredRows into Chunk to specify how many rows the parent executor wants, and update selectResult and streamResult to support this feature.
Check List
Tests
Code changes